7 research outputs found

    Micro-architecture independent branch behavior modeling

    No full text
    In this paper, we propose linear branch entropy, a new metric for characterizing branch behavior. The metric is independent of the configuration of a specific branch predictor, but it is highly correlated with the branch miss rate of any predictor. In particular, we show that there is a linear relationship between linear branch entropy and the branch miss rate. This means that the metric can be used to estimate branch miss rates without simulating a branch predictor by constructing a linear function between entropy and miss rate. The resulting model is more accurate than previously proposed branch classification models, such as taken rate and transition rate. Furthermore, linear branch entropy can be used to analyze the branch behavior of applications, independent of specific branch predictor implementations, and the linear branch miss rate function enables comparing branch predictors on how well they perform on easy-to-predict versus hard-to-predict branches. As a case study, we find that the winner of the latest branch predictor competition performs worse on hard-to-predict branches, compared to the third runner-up; however, since the benchmark suite mainly consisted of easy branches, a predictor that performs well on easy-to-predict branches has a lower average miss rate
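    As a minimal sketch of the "linear function between entropy and miss rate" idea in the abstract above, the Python below fits a least-squares line to (entropy, miss rate) pairs and then uses it to estimate a miss rate from a new entropy value. The function names, the sample numbers, and the choice of ordinary least squares are assumptions made for illustration only; the abstract does not say how the authors fit the line, and the data points are placeholders, not measurements from the paper.

```python
def fit_line(entropies, miss_rates):
    """Ordinary least-squares fit of miss_rate = slope * entropy + intercept."""
    n = len(entropies)
    mean_x = sum(entropies) / n
    mean_y = sum(miss_rates) / n
    # Spread of x around its mean, and co-variation of x with y.
    sxx = sum((x - mean_x) ** 2 for x in entropies)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(entropies, miss_rates))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_miss_rate(entropy, slope, intercept):
    """Estimate a miss rate from an entropy value using the fitted line."""
    return slope * entropy + intercept


# Placeholder training points (hypothetical, not from the paper): each pair
# stands for one benchmark's entropy and the miss rate of one predictor,
# which would have to be obtained once by simulation.
train_entropy = [0.0, 0.1, 0.2, 0.4]
train_miss_rate = [0.01, 0.06, 0.11, 0.21]

slope, intercept = fit_line(train_entropy, train_miss_rate)

# Estimate the miss rate of a new application from its entropy alone.
predicted = estimate_miss_rate(0.3, slope, intercept)
```

    Under these assumptions the predictor is simulated only for the training benchmarks; for any further application, the entropy profile alone yields the estimate.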

    Linear branch entropy : characterizing and optimizing branch behavior in a micro-architecture independent way

    No full text
    In this paper, we propose linear branch entropy, a new metric for characterizing branch behavior. Linear branch entropy is independent of the configuration of a specific branch predictor, but is highly correlated with the branch misprediction rate of any predictor. In particular, we empirically derive a linear relationship between linear branch entropy and branch misprediction rate, which enables predicting miss rates for a range of branch predictors using a single branch entropy profile. Linear branch entropy is more accurate than previously proposed branch classification models, such as taken rate and transition rate. In addition, linear branch entropy provides insight for both analyzing an application's inherent branch behavior as well as for understanding a branch predictor's performance for easy-to-predict versus hard-to-predict branches. We present several case studies, ranging from comparing state-of-the-art branch predictors to compiler optimizations. More in particular, we find that the winner of the latest branch predictor competition outperforms the runners-up on easy-to-predict branches, but performs worse on hard-to-predict branches. We also show that using linear branch entropy to guide if-conversion in compilers leads to better performance compared to standard if-conversion heuristics

    Analytical processor performance and power modeling using micro-architecture independent characteristics

    No full text
    Optimizing processors for (a) specific application(s) can substantially improve energy-efficiency. With the end of Dennard scaling, and the corresponding reduction in energy-efficiency gains from technology scaling, such approaches may become increasingly important. However, designing application-specific processors requires fast design space exploration tools to optimize for the targeted application(s). Analytical models can be a good fit for such design space exploration as they provide fast performance and power estimates and insight into the interaction between an application’s characteristics and the micro-architecture of a processor. Unfortunately, prior analytical models for superscalar out-of-order processors require micro-architecture dependent inputs, such as cache miss rates, branch miss rates and memory-level parallelism. This requires profiling the applications for each cache and branch predictor configuration of interest, which is far more time-consuming than evaluating the analytical performance models. In this work we present a micro-architecture independent profiler and associated analytical models that allow us to produce performance and power estimates across a large superscalar out-of-order processor design space almost instantaneously. We show that using a micro-architecture independent profile leads to a speedup of 300 compared to detailed simulation for our evaluated design space. Over a large design space, the model has a 9.3% average error for performance and a 4.3% average error for power, compared to detailed cycle-level simulation. The model is able to accurately determine the optimal processor configuration for different applications under power or performance constraints, and provides insight into performance through cycle stacks

    Micro-architecture independent analytical processor performance and power modeling

    No full text
    Optimizing processors for specific application(s) can substantially improve energy-efficiency. With the end of Dennard scaling, and the corresponding reduction in energy-efficiency gains from technology scaling, such approaches may become increasingly important. However, designing application-specific processors requires fast design space exploration tools to optimize for the targeted application(s). Analytical models can be a good fit for such design space exploration as they provide fast performance estimations and insight into the interaction between an application’s characteristics and the micro-architecture of a processor. Unfortunately, current analytical models require some micro-architecture dependent inputs, such as cache miss rates, branch miss rates and memory-level parallelism. This requires profiling the applications for each cache and branch predictor configuration, which is far more time-consuming than evaluating the actual performance models. In this work we present a micro-architecture independent profiler and associated analytical models that allow us to produce performance and power estimates across a large design space almost instantaneously. We show that using a micro-architecture independent profile leads to a speedup of 25% for our evaluated design space, compared to an analytical model that uses micro-architecture dependent profiles. Over a large design space, the model has a 13% error for performance and a 7% error for power, compared to cycle-level simulation. The model is able to accurately determine the optimal processor configuration for different applications under power or performance constraints, and it can provide insight into performance through cycle stacks